14 research outputs found

    A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

    Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image class labels alone is not sufficient; localising them at the original image pixel resolution is necessary. Boosted by the extraordinary ability of convolutional neural networks (CNNs) to create semantic, high-level and hierarchical image features, a large number of deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We start with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorise the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analyse the solutions put forward for the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all mentioned eras, with a brief summary of each approach explaining its contribution to the field. We conclude the survey by discussing the current challenges of the field and the extent to which they have been solved.

    A Hybrid Framework for Matching Printing Design Files to Product Photos

    We propose a real-time image matching framework, which is hybrid in the sense that it uses both hand-crafted features and deep features obtained from a well-tuned deep convolutional network. The matching problem we concentrate on is specific to a particular application: matching printing design files to product photos. Printing designs are template image files created using a design tool and are therefore perfect image signals. Photographs of a printed product, however, suffer from many unwanted effects, such as an uncontrolled shooting angle, uncontrolled illumination, occlusions, printing deficiencies in color, camera noise, optic blur, et cetera. For this purpose, we create an image set that includes printing design and corresponding product photo pairs in collaboration with an actual printing facility. Using this image set, we benchmark various hand-crafted and deep features for matching performance and propose a framework in which deep features make the dominant contribution without sacrificing real-time operation on an ordinary desktop computer.
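    The matching stage of such a hybrid framework can be sketched as a weighted concatenation of per-block normalised descriptors followed by cosine-similarity retrieval. This is a minimal illustration, not the paper's actual pipeline: the function names, the block weighting and the 0.8 weight are all assumptions.

    ```python
    import numpy as np

    def hybrid_descriptor(hand, deep, w_deep=0.8):
        """Fuse a hand-crafted and a deep descriptor into one vector.

        Each block is L2-normalised first so neither dominates by raw
        magnitude; w_deep (illustrative value) weights the deep block.
        """
        h = hand / (np.linalg.norm(hand) + 1e-8)
        d = deep / (np.linalg.norm(deep) + 1e-8)
        return np.concatenate([(1.0 - w_deep) * h, w_deep * d])

    def best_match(query, designs):
        """Index of the design descriptor most similar (cosine) to the query."""
        sims = [np.dot(query, d) / (np.linalg.norm(query) * np.linalg.norm(d) + 1e-8)
                for d in designs]
        return int(np.argmax(sims))
    ```

    In this scheme the product-photo descriptor is compared against every design-file descriptor, and the normalisation keeps the distorted photo features comparable to the clean design features.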

    Filter design for small target detection on infrared imagery using normalized-cross-correlation layer

    In this paper, we introduce a machine learning approach to the problem of filter design for infrared small target detection. For this purpose, similarly to a convolutional layer of a neural network, the normalized-cross-correlational (NCC) layer, which we utilize for designing a target detection/recognition filter bank, is proposed. By employing the NCC layer in a neural network structure, we introduce a framework in which supervised training is used to calculate the optimal filter shape and the optimum number of filters required for a specific target detection/recognition task on infrared images. We also propose the mean-absolute-deviation NCC (MAD-NCC) layer, an efficient implementation of the proposed NCC layer, designed especially for FPGA systems, in which square root operations are avoided for real-time computation. As a case study, we work on dim-target detection on mid-wave infrared imagery and obtain the filters that can discriminate a dim target from various types of background clutter specific to our operational concept.
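    The normalized cross-correlation that such a layer computes can be sketched in plain NumPy. This is a minimal illustration of the NCC response itself, not the paper's trainable layer or its FPGA-oriented MAD-NCC variant: both inputs are mean-subtracted and divided by their L2 norms, so the response lies in [-1, 1] and is invariant to affine intensity changes.

    ```python
    import numpy as np

    def ncc_response(patch, filt, eps=1e-8):
        """Normalized cross-correlation between one image patch and a filter."""
        p = patch - patch.mean()
        f = filt - filt.mean()
        return float(np.sum(p * f) / (np.linalg.norm(p) * np.linalg.norm(f) + eps))

    def ncc_map(image, filt):
        """Slide the filter over the image (valid mode), collecting NCC responses."""
        fh, fw = filt.shape
        ih, iw = image.shape
        out = np.zeros((ih - fh + 1, iw - fw + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                out[y, x] = ncc_response(image[y:y + fh, x:x + fw], filt)
        return out
    ```

    A patch that is an affine intensity transform of the filter (e.g. brighter or higher-contrast) still yields a response of 1, which is why NCC is attractive for detection under uncontrolled illumination.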

    Defining Image Memorability using the Visual Memory Schema

    Memorability of an image is a characteristic determined by the human observers’ ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independently of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS), referring to an organization of image components that human observers share when encoding and recognizing images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.

    SCALE-SPACE APPROACH FOR THE COMPARISON OF HK AND SC CURVATURE DESCRIPTIONS AS APPLIED TO OBJECT RECOGNITION

    Using mean curvature (H) and Gaussian curvature (K) values or shape index (S) and curvedness (C) values, HK and SC curvature spaces are constructed in order to classify surface patches into types such as pits, peaks, saddles, etc. Since both HK and SC curvature spaces classify surface patches into similar types, their classification capabilities are comparable. Previously, HK and SC curvature spaces were compared in terms of their classification ability only at the given data resolution [2]. When calculating H, K, S and C values, the scale/resolution ratio is highly effective. However, due to its scale invariant nature, shape index (S) values are independent of the resolution or the scale. Thus it is not surprising that the SC method gives better results than the HK method when the comparison is carried out at an uncontrolled scale/resolution level. In this study, the scale/resolution ratio is set to a constant value for the whole database and scale spaces based on both the HK and SC methods are built. Scale and orientation invariant features are extracted using scale spaces and these features are used in object recognition tasks. The methods are compared both mathematically and experimentally in terms of their surface classification and object recognition performances.
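    The relationship between the two description spaces can be made concrete: given H and K, the principal curvatures follow from the quadratic they satisfy, and S and C follow from the principal curvatures. The sketch below is illustrative; the sign convention for S varies between papers (here a dome gives S = 1, a cup S = −1, a saddle S = 0), and it also shows why S is scale invariant while C is not (for a sphere of radius r, S = 1 regardless of r but C = 1/r).

    ```python
    import numpy as np

    def hk_to_sc(H, K):
        """Convert mean (H) and Gaussian (K) curvature to shape index (S)
        and curvedness (C) via the principal curvatures k1 >= k2.

        Requires H**2 >= K (true at any real surface point); S is
        undefined at planar points (k1 == k2 == 0).
        """
        disc = np.sqrt(np.maximum(H * H - K, 0.0))
        k1, k2 = H + disc, H - disc                        # principal curvatures
        C = np.sqrt((k1 * k1 + k2 * k2) / 2.0)             # curvedness (has scale)
        S = (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)   # shape index in [-1, 1]
        return S, C
    ```

    For a unit sphere (H = 1, K = 1) this yields S = 1, C = 1; for a saddle with H = 0, K = −1 it yields S = 0, C = 1.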

    Simulation of Turkish lip motion and facial expressions in a 3D environment and synchronization with a Turkish speech engine

    In this thesis, the 3D animation of human facial expressions and lip motion, and their synchronization with a Turkish speech engine, is analyzed using the JAVA programming language, the JAVA3D API and the Java Speech API. A three-dimensional animation model for simulating Turkish lip motion and facial expressions is developed. In addition to lip motion, synchronization with a Turkish speech engine is achieved. The output of the study is facial expressions and Turkish lip motion synchronized with Turkish speech, where the input is Turkish text in Java Speech Markup Language (JSML) format, also indicating expressions. The animation is created using the JAVA3D API. 3D facial models corresponding to different lip positions of the same person are morphed into each other to construct the animation. Moreover, simulations of human facial expressions of emotions are created within the animation. An expression weight parameter, which states the weight of the given expression, is introduced. The synchronization of lip motion with Turkish speech is achieved via CloudGarden(R)'s Java Speech API interface [2]. The "Levent16k SAPI 4-5 Male Voice" of the G-V.S Voice Technologies Software Firm is used as the Turkish speech engine [3]. Finally, a virtual Turkish speaker with facial expressions of emotions is created as a JAVA3D animation.

    3D Data Processing for Enhancement of Face Scanner Data

    The data acquired by 3D face scanners suffer from distortions such as spikes, holes and noise. Enhancing 3D face data by removing these distortions while preserving the facial features is important for applications that use such data. In this study, thresholding is used for removing spikes, thresholding together with face symmetry is used for hole filling, and bilateral filtering is used for smoothing; satisfactory results are obtained on the FRGC 3D face data.
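    The spike-removal step can be sketched as local-median thresholding on the depth map. This is an illustrative NumPy sketch, not the study's exact procedure: the window size and threshold are assumed values, and the symmetry-based hole filling and bilateral smoothing stages are omitted.

    ```python
    import numpy as np

    def remove_spikes(depth, win=3, thresh=5.0):
        """Replace spike samples in a depth map by a local median.

        A sample is flagged as a spike when it deviates from the median of
        its win x win neighbourhood by more than `thresh` (depth units).
        """
        h, w = depth.shape
        r = win // 2
        padded = np.pad(depth, r, mode="edge")
        med = np.empty_like(depth)
        for y in range(h):
            for x in range(w):
                med[y, x] = np.median(padded[y:y + win, x:x + win])
        out = depth.copy()
        spikes = np.abs(depth - med) > thresh
        out[spikes] = med[spikes]
        return out, spikes
    ```

    Because the median of a small neighbourhood is robust to a single outlier, an isolated spike is detected and replaced while smooth facial structure passes through unchanged.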

    3D face modeling using multiple images

    3D face modeling based on real images is one of the important subjects of Computer Vision that has been studied recently. In this paper, the study conducted in our Computer Vision and Intelligent Systems Research Laboratory on 3D face model generation using uncalibrated multiple still images is explained.

    Scale invariant representation of 2.5D data

    In this paper, a scale and orientation invariant feature representation for 2.5D objects is introduced, which may be used to classify, detect and recognize objects even in the presence of clutter and/or occlusion. With this representation, a 2.5D object is defined by an attributed graph structure in which the nodes are the pit and peak regions on the surface. The attributes of the graph are the scales, positions and normals of these pits and peaks. In order to detect these regions, a "peakness" (or pitness) measure is defined based on Gaussian curvature, calculated at various scales on the surface. Finally, a "position vs. scale" feature volume is obtained and the graph nodes are extracted from this feature space by volume segmentation techniques.
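    For a surface given as a height field z(x, y), the multi-scale curvature computation behind such a "position vs. scale" volume can be sketched as follows. This is a NumPy-only illustration under assumed conventions (unit grid spacing, Gaussian curvature itself standing in for the paper's peakness measure); the volume segmentation step is omitted. Positive K at a height maximum marks a peak, at a minimum a pit.

    ```python
    import numpy as np

    def gaussian_smooth(z, sigma):
        """Separable Gaussian blur with edge padding (NumPy only)."""
        r = max(1, int(3 * sigma))
        t = np.arange(-r, r + 1)
        k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
        k /= k.sum()
        pad = np.pad(z, r, mode="edge")
        sm = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, pad)
        return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, sm)

    def gaussian_curvature(z):
        """Gaussian curvature K of a height field z(x, y) on a unit grid."""
        zy, zx = np.gradient(z)
        zyy, _ = np.gradient(zy)
        zxy, zxx = np.gradient(zx)
        return (zxx * zyy - zxy ** 2) / (1.0 + zx ** 2 + zy ** 2) ** 2

    def peakness_volume(z, sigmas=(1.0, 2.0, 4.0)):
        """'Position vs. scale' stack: K of the surface smoothed at each scale."""
        return np.stack([gaussian_curvature(gaussian_smooth(z, s)) for s in sigmas])
    ```

    Scanning this stack for local extrema gives candidate peak/pit regions together with the scale at which each responds most strongly, which is exactly the information stored in the graph-node attributes.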